Collaborating Authors

 León


A visual big data system for the prediction of weather-related variables: Jordan-Spain case study

Aljawarneh, Shadi, Lara, Juan A., Yassein, Muneer Bani

arXiv.org Artificial Intelligence

Meteorology is a field in which huge amounts of data are generated, mainly collected by sensors at weather stations, where different variables can be measured. Those data have some particularities such as high volume and dimensionality, the frequent existence of missing values in some stations, and the high correlation between collected variables. In this regard, it is crucial to make use of Big Data and Data Mining techniques to deal with those data and extract useful knowledge from them that can be used, for instance, to predict weather phenomena. In this paper, we propose a visual big data system that is designed to deal with large amounts of weather-related data and lets the user analyze those data to perform predictive tasks over the considered variables (temperature and rainfall). The proposed system collects open data and loads them onto a local NoSQL database, fusing them at different levels of temporal and spatial aggregation in order to perform a predictive analysis using univariate and multivariate approaches, as well as forecasting based on training data from neighboring stations in cases with high rates of missing values. The system has been assessed in terms of usability and predictive performance, obtaining an overall normalized mean squared error value of 0.00013 and an overall directional symmetry value of nearly 0.84. Our system has been rated positively by a group of experts in the area (all aspects of the system except graphic design were rated 3 or above on a 1-5 scale). The promising preliminary results obtained demonstrate the validity of our system and invite us to keep working in this area.
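The two performance figures quoted in this abstract can be illustrated with a small sketch. The abstract does not state the exact formulas, so the definitions below are assumptions: normalized mean squared error is taken as the squared error divided by the spread of the actual series around its mean, and directional symmetry as the fraction of time steps in which the prediction moves in the same direction as the observation. The function names and the sample series are hypothetical.

```python
from statistics import fmean


def nmse(actual, predicted):
    """Squared error normalized by the spread of the actual series (assumed definition)."""
    mean_actual = fmean(actual)
    squared_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    spread = sum((a - mean_actual) ** 2 for a in actual)
    return squared_error / spread


def directional_symmetry(actual, predicted):
    """Fraction of steps where actual and predicted change in the same direction."""
    hits = 0
    for i in range(1, len(actual)):
        actual_move = actual[i] - actual[i - 1]
        predicted_move = predicted[i] - predicted[i - 1]
        if actual_move * predicted_move >= 0:
            hits += 1
    return hits / (len(actual) - 1)


# Hypothetical daily temperatures and forecasts, for illustration only.
temps = [12.0, 14.0, 13.0, 15.0, 16.0]
preds = [12.5, 13.5, 14.0, 14.5, 16.5]

error = nmse(temps, preds)
symmetry = directional_symmetry(temps, preds)
```

Under these definitions a value of NMSE close to 0 means the forecast error is small relative to the natural variability of the series, and a directional symmetry close to 1 means the forecast almost always gets the direction of change right.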


Vacuum Spiker: A Spiking Neural Network-Based Model for Efficient Anomaly Detection in Time Series

Vázquez, Iago Xabier, Sedano, Javier, Afzal, Muhammad, García-Vico, Ángel Miguel

arXiv.org Artificial Intelligence

Anomaly detection is a key task across domains such as industry, healthcare, and cybersecurity. Many real-world anomaly detection problems involve analyzing multiple features over time, making time series analysis a natural approach for such problems. While deep learning models have achieved strong performance in this field, their tendency toward high energy consumption limits their deployment in resource-constrained environments such as IoT devices, edge computing platforms, and wearables. To address this challenge, this paper introduces the Vacuum Spiker algorithm, a novel Spiking Neural Network-based method for anomaly detection in time series. It incorporates a new detection criterion that relies on global changes in neural activity rather than reconstruction or prediction error. It is trained using Spike Time-Dependent Plasticity in a novel way, intended to induce changes in neural activity when anomalies occur. A new efficient encoding scheme is also proposed, which discretizes the input space into non-overlapping intervals, assigning each to a single neuron. This strategy encodes information with a single spike per time step, improving energy efficiency compared to conventional encoding methods. Experimental results on publicly available datasets show that the proposed algorithm achieves competitive performance while significantly reducing energy consumption, compared to a wide set of deep learning and machine learning baselines. Furthermore, its practical utility is validated in a real-world case study, where the model successfully identifies power curtailment events in a solar inverter. These results highlight its potential for sustainable and efficient anomaly detection.
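The encoding idea described in this abstract (non-overlapping intervals, one neuron per interval, one spike per time step) can be sketched as follows. This is not the authors' implementation: equal-width intervals, clamping of out-of-range values, and the names `make_interval_encoder` and `spike_train` are all assumptions made for illustration.

```python
def make_interval_encoder(low, high, n_neurons):
    """Return a function mapping a value to the index of the single neuron that spikes."""
    width = (high - low) / n_neurons  # assumption: equal-width intervals

    def encode(value):
        # Assumption: out-of-range values are clamped to the first or last interval.
        clamped = min(max(value, low), high)
        index = int((clamped - low) / width)
        return min(index, n_neurons - 1)  # value == high falls in the last interval

    return encode


def spike_train(series, encode, n_neurons):
    """Build a one-hot spike vector per time step: exactly one neuron fires."""
    train = []
    for value in series:
        spikes = [0] * n_neurons
        spikes[encode(value)] = 1
        train.append(spikes)
    return train


encode = make_interval_encoder(low=0.0, high=1.0, n_neurons=4)
train = spike_train([0.1, 0.3, 0.95, 1.0], encode, n_neurons=4)
```

Because each time step produces exactly one spike regardless of the number of neurons, the spike count grows with the length of the series only, which is the property the abstract links to energy efficiency.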


Anomaly detection in network flows using unsupervised online machine learning

Miguel-Diez, Alberto, Campazas-Vega, Adrián, Guerrero-Higueras, Ángel Manuel, Álvarez-Aparicio, Claudia, Matellán-Olivera, Vicente

arXiv.org Artificial Intelligence

Nowadays, the volume of network traffic continues to grow, along with the frequency and sophistication of attacks. This scenario highlights the need for solutions capable of continuously adapting, since network behavior is dynamic and changes over time. This work presents an anomaly detection model for network flows using unsupervised machine learning with online learning capabilities. This approach allows the system to dynamically learn the normal behavior of the network and detect deviations without requiring labeled data, which is particularly useful in real-world environments where traffic is constantly changing and labeled data is scarce. The model was implemented using the River library with a One-Class SVM and evaluated on the NF-UNSW-NB15 dataset and its extended version v2, which contain network flows labeled with different attack categories. The results show an accuracy above 98%, a false positive rate below 3.1%, and a recall of 100% in the most advanced version of the dataset. In addition, the low processing time per flow (<0.033 ms) demonstrates the feasibility of the approach for real-time applications.
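The online, unsupervised workflow described in this abstract (score each flow as it arrives, then update the model without labels) can be sketched with the standard library alone. The detector below is deliberately a stand-in, not a One-Class SVM and not the River API: it keeps a running mean and variance of a single hypothetical feature using Welford's update and flags values far from the mean. The threshold, the warm-up length, and the choice to learn only from flows judged normal are assumptions.

```python
import math


class RunningZScoreDetector:
    """Stand-in online detector (NOT a One-Class SVM): running mean/variance via Welford."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # assumption: flag scores above 3 standard deviations
        self.warmup = warmup        # assumption: no alerts until 5 samples are seen
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def score(self, x):
        if self.count < self.warmup:
            return 0.0
        std = math.sqrt(self.m2 / (self.count - 1))
        return abs(x - self.mean) / std if std > 0 else 0.0

    def learn(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)


def detect(stream, detector):
    """Score first, then learn: each flow is judged before the model sees it."""
    flags = []
    for x in stream:
        is_anomaly = detector.score(x) > detector.threshold
        flags.append(is_anomaly)
        if not is_anomaly:
            detector.learn(x)  # assumption: update only on flows judged normal
    return flags


# Hypothetical bytes-per-flow values; the seventh flow is an obvious outlier.
bytes_per_flow = [100, 102, 98, 101, 99, 100, 500, 101]
flags = detect(bytes_per_flow, RunningZScoreDetector())
```

Real network flows carry many features, so a practical system would replace this detector with a multivariate model; the score-then-learn loop is the part that carries over.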


A XAI-based Framework for Frequency Subband Characterization of Cough Spectrograms in Chronic Respiratory Disease

Amado-Caballero, Patricia, San-José-Revuelta, Luis M., Wang, Xinheng, Garmendia-Leiza, José Ramón, Alberola-López, Carlos, Casaseca-de-la-Higuera, Pablo

arXiv.org Artificial Intelligence

This paper presents an explainable artificial intelligence (XAI)-based framework for the spectral analysis of cough sounds associated with chronic respiratory diseases, with a particular focus on Chronic Obstructive Pulmonary Disease (COPD). A Convolutional Neural Network (CNN) is trained on time-frequency representations of cough signals, and occlusion maps are used to identify diagnostically relevant regions within the spectrograms. These highlighted areas are subsequently decomposed into five frequency subbands, enabling targeted spectral feature extraction and analysis. The results reveal that spectral patterns differ across subbands and disease groups, uncovering complementary and compensatory trends across the frequency spectrum. Notably, the approach distinguishes COPD from other respiratory conditions, and chronic from non-chronic patient groups, based on interpretable spectral markers. These findings provide insight into the underlying pathophysiological characteristics of cough acoustics and demonstrate the value of frequency-resolved, XAI-enhanced analysis for biomedical signal interpretation and translational respiratory disease diagnostics.
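A minimal sketch of the two steps named in this abstract, occlusion mapping followed by a split into five frequency subbands, is given below. It is an illustration under assumptions, not the authors' pipeline: the trained CNN is replaced by a hypothetical `toy_predict` function, the occluded value is zero, the subbands are contiguous and of equal width, and the spectrogram is assumed to have at least as many frequency rows as subbands.

```python
def occlusion_map(spectrogram, predict, patch=1, baseline=0.0):
    """Score drop when each patch is masked; a larger drop marks a more relevant region."""
    n_freq, n_time = len(spectrogram), len(spectrogram[0])
    reference = predict(spectrogram)
    heat = [[0.0] * n_time for _ in range(n_freq)]
    for f0 in range(0, n_freq, patch):
        for t0 in range(0, n_time, patch):
            masked = [row[:] for row in spectrogram]  # copy, then occlude one patch
            for f in range(f0, min(f0 + patch, n_freq)):
                for t in range(t0, min(t0 + patch, n_time)):
                    masked[f][t] = baseline
            drop = reference - predict(masked)
            for f in range(f0, min(f0 + patch, n_freq)):
                for t in range(t0, min(t0 + patch, n_time)):
                    heat[f][t] = drop
    return heat


def subband_relevance(heat, n_bands=5):
    """Mean relevance per contiguous frequency band (rows are frequency bins)."""
    n_freq = len(heat)
    edges = [round(i * n_freq / n_bands) for i in range(n_bands + 1)]
    means = []
    for lo, hi in zip(edges, edges[1:]):
        cells = [value for row in heat[lo:hi] for value in row]
        means.append(sum(cells) / len(cells))
    return means


def toy_predict(spec):
    """Hypothetical stand-in for a trained CNN: it only responds to frequency rows 2 and 3."""
    return sum(spec[2]) + sum(spec[3])


spec = [[1.0, 1.0] for _ in range(10)]  # 10 frequency bins x 2 time frames
heat = occlusion_map(spec, toy_predict)
bands = subband_relevance(heat)
```

In this toy case all the relevance lands in the second subband, because that is the only region the stand-in model uses; with a real classifier the per-band values would be compared across patient groups.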


XAI-Driven Spectral Analysis of Cough Sounds for Respiratory Disease Characterization

Amado-Caballero, Patricia, San-José-Revuelta, Luis Miguel, Aguilar-García, María Dolores, Garmendia-Leiza, José Ramón, Alberola-López, Carlos, Casaseca-de-la-Higuera, Pablo

arXiv.org Artificial Intelligence

This paper proposes an eXplainable Artificial Intelligence (XAI)-driven methodology to enhance the understanding of cough sound analysis for respiratory disease management. We employ occlusion maps to highlight relevant spectral regions in cough spectrograms processed by a Convolutional Neural Network (CNN). Subsequently, spectral analysis of spectrograms weighted by these occlusion maps reveals significant differences between disease groups, particularly in patients with COPD, where cough patterns appear more variable in the identified spectral regions of interest. This contrasts with the lack of significant differences observed when analyzing raw spectrograms. The proposed approach extracts and analyzes several spectral features, demonstrating the potential of XAI techniques to uncover disease-specific acoustic signatures and improve the diagnostic capabilities of cough sound analysis by providing more interpretable results.
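The contrast drawn in this abstract between raw spectrograms and spectrograms weighted by occlusion maps can be illustrated with a single spectral feature. The abstract does not list the features used, so the spectral centroid below is an assumed example, and the function name, frequency grid, and weight values are hypothetical.

```python
def weighted_spectral_centroid(spectrogram, weights, freqs):
    """Per-frame centroid of the element-wise product of spectrogram and weights."""
    n_time = len(spectrogram[0])
    centroids = []
    for t in range(n_time):
        energy = [spectrogram[f][t] * weights[f][t] for f in range(len(freqs))]
        total = sum(energy)
        if total > 0:
            centroids.append(sum(fr * e for fr, e in zip(freqs, energy)) / total)
        else:
            centroids.append(0.0)  # assumption: report 0 for frames with no weighted energy
    return centroids


freqs = [100.0, 200.0, 300.0]                      # hypothetical bin centers in Hz
spec = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]        # 3 frequency bins x 2 time frames
uniform = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]     # equivalent to the raw spectrogram
relevance = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]   # hypothetical occlusion-map weights

raw = weighted_spectral_centroid(spec, uniform, freqs)
weighted = weighted_spectral_centroid(spec, relevance, freqs)
```

Here the raw centroid is identical in both frames, while the weighted centroid differs between them, a toy version of how weighting by relevance can expose differences that the unweighted spectrogram hides.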


Co-Writing with AI, on Human Terms: Aligning Research with User Demands Across the Writing Process

Reza, Mohi, Thomas-Mitchell, Jeb, Dushniku, Peter, Laundry, Nathan, Williams, Joseph Jay, Kuzminykh, Anastasia

arXiv.org Artificial Intelligence

As generative AI tools like ChatGPT become integral to everyday writing, critical questions arise about how to preserve writers' sense of agency and ownership when using these tools. Yet, a systematic understanding of how AI assistance affects different aspects of the writing process - and how this shapes writers' agency - remains underexplored. To address this gap, we conducted a systematic review of 109 HCI papers using the PRISMA approach. From this literature, we identify four overarching design strategies for AI writing support: structured guidance, guided exploration, active co-writing, and critical feedback - mapped across the four key cognitive processes in writing: planning, translating, reviewing, and monitoring. We complement this analysis with interviews of 15 writers across diverse domains. Our findings reveal that writers' desired levels of AI intervention vary across the writing process: content-focused writers (e.g., academics) prioritize ownership during planning, while form-focused writers (e.g., creatives) value control over translating and reviewing. Writers' preferences are also shaped by contextual goals, values, and notions of originality and authorship. By examining when ownership matters, what writers want to own, and how AI interactions shape agency, we surface both alignment and gaps between research and user needs. Our findings offer actionable design guidance for developing human-centered writing tools for co-writing with AI, on human terms.


Predicting fall risk in older adults: A machine learning comparison of accelerometric and non-accelerometric factors

González-Castro, Ana, Benítez-Andrades, José Alberto, González-González, Rubén, Prada-García, Camino, Leirós-Rodríguez, Raquel

arXiv.org Artificial Intelligence

Objectives: Accurate prediction of fall risk in older adults is essential to prevent injuries and improve quality of life. This study evaluates the predictive performance of various machine learning models using accelerometric and non-accelerometric data, aiming to improve predictive accuracy and identify key contributing variables. Methods: We applied random forest, XGBoost, AdaBoost, LightGBM, support vector regression (SVR), decision trees, and Bayesian ridge regression to a dataset of 146 older adults. Models were trained using accelerometric data (movement patterns) and non-accelerometric data (demographic and clinical variables). Results: Models trained on combined accelerometric and non-accelerometric data consistently outperformed those based on single data types. Bayesian ridge regression achieved the highest accuracy (MSE = 0.6746, R² = …). Non-accelerometric factors, including age and comorbidities, significantly contributed to fall risk prediction. Conclusions: Integrating accelerometric and non-accelerometric data improves fall risk prediction accuracy in older adults. Bayesian ridge regression trained on combined datasets provides superior predictive power compared to traditional models. Future work should validate these models in larger, more diverse populations to enhance clinical applicability.

HEALTH Volume 11: 1-16, DOI: 10.1177/20552076251331752

Introduction and related work

Background on fall risk

Falls among older adults are a major health concern, with one-third experiencing falls annually, and up to 20% resulting in serious injuries such as fractures or head trauma. This problem is compounded by an aging population and places a significant economic burden on healthcare systems, exceeding 2 billion dollars annually in countries like Canada. Beyond physical injuries, falls reduce functional independence and quality of life. They often lead to prolonged hospitalizations, institutionalization, and increased mortality. Additionally, the fear of falling can discourage physical activity, creating a cycle of physical decline that further elevates fall risk. The financial burden of falls is expected to increase as populations age, reinforcing the urgent need for effective fall prevention and improved risk prediction methods to mitigate both health and economic consequences.


EsBBQ and CaBBQ: The Spanish and Catalan Bias Benchmarks for Question Answering

Ruiz-Fernández, Valle, Mina, Mario, Falcão, Júlia, Vasquez-Reina, Luis, Sallés, Anna, Gonzalez-Agirre, Aitor, Perez-de-Viñaspre, Olatz

arXiv.org Artificial Intelligence

Previous literature has largely shown that Large Language Models (LLMs) perpetuate social biases learnt from their pre-training data. Given the notable lack of resources for social bias evaluation in languages other than English, and for social contexts outside of the United States, this paper introduces the Spanish and the Catalan Bias Benchmarks for Question Answering (EsBBQ and CaBBQ). Based on the original BBQ, these two parallel datasets are designed to assess social bias across 10 categories using a multiple-choice QA setting, now adapted to the Spanish and Catalan languages and to the social context of Spain. We report evaluation results on different LLMs, factoring in model family, size and variant. Our results show that models tend to fail to choose the correct answer in ambiguous scenarios, and that high QA accuracy often correlates with greater reliance on social biases.


Applying XAI based unsupervised knowledge discovering for Operation modes in a WWTP. A real case: AQUAVALL WWTP

Beneyto-Rodriguez, Alicia, Sainz-Palmero, Gregorio I., Galende-Hernández, Marta, Fuente, María J., Cuenca, José M.

arXiv.org Artificial Intelligence

Water reuse is a key point when fresh water is a commodity in ever greater demand, but which is also becoming ever less available. Furthermore, the return of clean water to its natural environment is also mandatory. Therefore, wastewater treatment plants (WWTPs) are essential in any policy focused on these serious challenges. WWTPs are complex facilities which need to operate at their best to achieve their goals. Nowadays, they are extensively monitored, generating large databases of historical data concerning their functioning over time. All this implies a large amount of embedded information which is not usually easy for plant managers to assimilate, correlate and understand; in other words, it is hard for them to know the global operation of the plant at any given time. At this point, intelligent and Machine Learning (ML) approaches can support that need, managing all the data and translating them into manageable, interpretable and explainable knowledge about how the WWTP is operating at a glance. Here, an eXplainable Artificial Intelligence (XAI) based methodology is proposed and tested for a real WWTP, in order to extract explainable service knowledge concerning the operation modes of the WWTP managed by AQUAVALL, which is the public service in charge of the integral water cycle in the City Council of Valladolid (Castilla y León, Spain). By applying well-known approaches of XAI and ML focused on the challenge of WWTP, it has been possible to summarize a large number of historical databases through a few explained operation modes of the plant in a low-dimensional data space, showing the variables and facility units involved in each case.


Enhanced prediction of spine surgery outcomes using advanced machine learning techniques and oversampling methods

Benítez-Andrades, José Alberto, Prada-García, Camino, Ordás-Reyes, Nicolás, Blanco, Marta Esteban, Merayo, Alicia, Serrano-García, Antonio

arXiv.org Artificial Intelligence

The study proposes an advanced machine learning approach to predict spine surgery outcomes by incorporating oversampling techniques and grid search optimization. A variety of models including GaussianNB, ComplementNB, KNN, Decision Tree, and optimized versions with RandomOverSampler and SMOTE were tested on a dataset of 244 patients, which included pre-surgical, psychometric, socioeconomic, and analytical variables. The enhanced KNN models achieved up to 76% accuracy and a 67% F1-score, while grid-search optimization further improved performance. The findings underscore the potential of these advanced techniques to aid healthcare professionals in decision-making, with future research needed to refine these models on larger and more diverse datasets.
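The oversampling step mentioned in this abstract can be sketched in a few lines. This is a plain standard-library illustration of random oversampling, not the RandomOverSampler implementation the study refers to, and it does not cover SMOTE, which synthesizes new points rather than duplicating existing ones. The function name, the fixed seed, and the sample data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class matches the largest."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical single-feature patients with an imbalanced outcome label.
rows = [[1.0], [1.2], [0.9], [1.1], [3.0]]
labels = ["good", "good", "good", "good", "poor"]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

Oversampling is normally applied to the training split only; duplicating rows before the train/test split would let copies of the same patient appear on both sides and inflate the reported scores.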